
PromQL: Wire WITHOUT in Translator#144261

Closed
sidosera wants to merge 8 commits into elastic:main from sidosera:feature/promql-without-grouping-v2

@sidosera
Contributor

@sidosera sidosera commented Mar 14, 2026

Adds the translation flow for PromQL without grouping as part of #139793.

Summary:

Stack:

@elasticsearchmachine elasticsearchmachine added the v9.4.0 and needs:triage labels Mar 14, 2026
@sidosera sidosera marked this pull request as draft March 14, 2026 02:42
@sidosera sidosera added the Team:StorageEngine, :StorageEngine/PromQL, and >non-issue labels and removed the needs:triage label Mar 14, 2026
@sidosera sidosera changed the title PromQL: Wire WITHOUT translation flow PromQL: Wire WITHOUT in Translator Mar 14, 2026
@sidosera sidosera requested a review from felixbarny March 14, 2026 03:22
@sidosera sidosera marked this pull request as ready for review March 18, 2026 23:59
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Member

@felixbarny felixbarny left a comment

I'm not through the PR yet, but here are some early comments. This PR is now also quite large again.

// For WITHOUT the concrete labels are only determined during translation (they depend on which labels
// the data actually has minus the excluded set), so we must resolve them against the translated plan.
if (result.hasExcludedLabels()) {
plan = new Project(promqlCommand.source(), plan, resolveOutput(promqlCommand, plan, result));
Member

Resolution is an overloaded term here. The analyzer has a resolution phase where it resolves tables/functions/attributes etc. We should probably use a different term here.

return Literal.timeDuration(promqlCommand.source(), Duration.ofMillis(Math.max(1L, nextRoundedValue - roundedStart)));
}

/**
Member

For most of these helper methods, there's potential to simplify by using the streams API rather than manual/imperative loops and filtering.
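As a rough illustration of that suggestion, here is a minimal sketch that places an imperative filter loop next to the equivalent stream pipeline. The `Label` record and both method names are hypothetical stand-ins invented for this example. They are not the helper methods from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class StreamFilterSketch {
    // Hypothetical stand-in for an attribute; not the real Attribute class.
    record Label(String name, boolean metadata) {}

    // Imperative version: manual loop plus filtering.
    static List<String> labelNamesLoop(List<Label> labels) {
        List<String> names = new ArrayList<>();
        for (Label label : labels) {
            if (label.metadata() == false) {
                names.add(label.name());
            }
        }
        return names;
    }

    // Stream version: same filter and mapping, expressed as a pipeline.
    static List<String> labelNamesStream(List<Label> labels) {
        return labels.stream()
            .filter(label -> label.metadata() == false)
            .map(Label::name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Label> labels = List.of(new Label("host", false), new Label("_index", true), new Label("pod", false));
        System.out.println(labelNamesLoop(labels));
        System.out.println(labelNamesStream(labels));
    }
}
```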

projections.add(stepAttr);

return new Aggregate(promqlCommand.source(), plan, groupings, aggs);
List<Attribute> expectedExtra = promqlCommand.output().subList(2, promqlCommand.output().size());
Member

expectedExtra sounds a bit cryptic. These are the dimension columns, right? Maybe add a short method-level comment that describes the column layout like [value, step, ...dimensions].
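A small sketch of what such a documented layout could look like, assuming the output really is ordered as [value, step, ...dimensions] as the comment above supposes. The class, constants, and method name are hypothetical and only illustrate naming the indices instead of using a bare `subList(2, ...)`.

```java
import java.util.List;

final class ColumnLayoutSketch {
    // Assumed column layout: [value, step, ...dimensions].
    static final int VALUE_INDEX = 0;
    static final int STEP_INDEX = 1;
    static final int FIRST_DIMENSION_INDEX = 2;

    // Returns the trailing dimension columns, i.e. everything after value and step.
    static <T> List<T> dimensionColumns(List<T> output) {
        if (output.size() < FIRST_DIMENSION_INDEX) {
            throw new IllegalArgumentException("expected at least [value, step], got " + output);
        }
        return output.subList(FIRST_DIMENSION_INDEX, output.size());
    }

    public static void main(String[] args) {
        List<String> output = List.of("value", "step", "host", "pod");
        System.out.println(dimensionColumns(output));
    }
}
```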

return new PackedGrouping(new Project(ctx.promqlCommand().source(), rewritten, projections), packedAttr);
}

private static List<NamedExpression> resolveOutput(PromqlCommand promqlCommand, LogicalPlan plan, TranslationResult result) {
Member

A high level description of what this does and what the purpose is would be useful.

assertThat(tsAggregate.timeBucket().buckets().fold(FoldContext.small()), equalTo(Duration.ofMinutes(10)));
}

public void testWithoutGroupingProducesTimeSeriesOutput() {
Member

These tests can be removed now, right? They live in PromqlPlanWithoutGroupingTests now.

Member

@felixbarny felixbarny left a comment

The logic is sound, but the code reads as a sequence of "what" rather than "why", and I found it hard to follow even though I have an understanding of what this is trying to do at a high level. The biggest single improvement would be a concise top-level explanation of the label-propagation model, followed by splitting createOuterAggregate into its two distinct cases. Maybe there are more things we can do to make it more readable.

@@ -217,20 +267,56 @@ private TranslationResult translateNode(LogicalPlan node, LogicalPlan currentPla
* expressions embedded inside the aggregation, but do not create Aggregate plan nodes themselves.
*/
private TranslationResult translateAcrossSeriesAggregate(AcrossSeriesAggregate agg, LogicalPlan currentPlan, TranslationContext ctx) {
Member

The whole PR is built around a bidirectional label-propagation scheme — requiredLabels flows down, exposedLabels/excludedLabels flow up — but this model is never stated explicitly. A developer encountering this code cold has to reverse-engineer it from the individual pieces. A single Javadoc comment at the top of translateAcrossSeriesAggregate explaining the two-pass propagation strategy and why it's needed for WITHOUT would be worth more than all the inline comments combined.

Contributor Author

Good suggestion. Added doc section for that.
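To make the model discussed in this thread concrete, here is a toy sketch of two-direction label propagation: a set of required labels is passed down the tree, and exposed and excluded label sets are returned on the way up. Everything in it is an assumption made for illustration: the `Selector`, `By`, and `Without` node types, the `Result` record, and the exact rules in `translate` are hypothetical and do not reproduce the PR's translator.

```java
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

final class LabelPropagationSketch {
    // Hypothetical plan nodes; not the classes used in the PR.
    interface Node {}
    record Selector(Set<String> dataLabels) implements Node {}
    record By(Set<String> labels, Node child) implements Node {}
    record Without(Set<String> labels, Node child) implements Node {}

    // Flows up: labels the node outputs, and labels removed by WITHOUT below it.
    record Result(Set<String> exposed, Set<String> excluded) {}

    // `required` flows down; Optional.empty() means "no restriction, keep every label".
    static Result translate(Node node, Optional<Set<String>> required) {
        if (node instanceof Selector s) {
            Set<String> exposed = new TreeSet<>(s.dataLabels());
            required.ifPresent(exposed::retainAll); // only keep what the parent asked for
            return new Result(exposed, new TreeSet<>());
        }
        if (node instanceof By by) {
            // BY knows its labels up front, so it can restrict the child.
            Result child = translate(by.child(), Optional.of(by.labels()));
            Set<String> exposed = new TreeSet<>(child.exposed());
            exposed.retainAll(by.labels());
            return new Result(exposed, new TreeSet<>());
        }
        if (node instanceof Without w) {
            // WITHOUT only learns its concrete labels from what the child exposes.
            Result child = translate(w.child(), required);
            Set<String> exposed = new TreeSet<>(child.exposed());
            exposed.removeAll(w.labels());
            Set<String> excluded = new TreeSet<>(child.excluded());
            excluded.addAll(w.labels());
            return new Result(exposed, excluded);
        }
        throw new IllegalArgumentException("unknown node: " + node);
    }

    public static void main(String[] args) {
        Node selector = new Selector(Set.of("a", "b", "c"));
        System.out.println(translate(new Without(Set.of("b"), selector), Optional.empty()));
        System.out.println(translate(new By(Set.of("a"), new Without(Set.of("b"), selector)), Optional.empty()));
    }
}
```

In this toy version, a top-level WITHOUT has no required set, so the selector has to expose every label the data has. That mirrors the point made earlier in the review that the concrete labels for WITHOUT are only known once the child has been translated.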

}
}

// TODO: Should we fail in case of missmatch?
Member

Suggested change
- // TODO: Should we fail in case of missmatch?
+ // TODO: Should we fail in case of mismatch?

In what cases does a mismatch happen? Can we validate this early and throw a validation error (4xx) rather than failing here (5xx)?

Contributor Author

We should probably surface this in some form, but it is not straightforward as far as I can tell because PromQL translation happens after analysis. For example, in:

avg by (c) (
      sum without (b) (
         max by (a,b) (...)
      )
  )

we can tell that the result is meaningless: c cannot appear in the output of the without stage because the innermost aggregation doesn't produce it. This is likely a typo, and a warning would be good UX (not necessarily failing the query, if we want to stay compliant with the PromQL spec).

Member

Let's track an issue for this. Also remember that 5xx will cause serverless alerts.
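For the issue, the check on the example query above can be sketched with plain set arithmetic. This is only an illustration under the assumption that the innermost aggregation uses `by`, so its output labels are known without looking at the data. The class and method names are hypothetical, and the sketch returns the missing labels so a caller could emit a warning or a validation error instead of failing late.

```java
import java.util.Set;
import java.util.TreeSet;

final class GroupingMismatchSketch {
    // Labels that survive a `by (...)` aggregation: at most the listed ones.
    static Set<String> afterBy(Set<String> byLabels) {
        return new TreeSet<>(byLabels);
    }

    // Labels that survive a `without (...)` aggregation: the input minus the excluded ones.
    static Set<String> afterWithout(Set<String> input, Set<String> excluded) {
        Set<String> result = new TreeSet<>(input);
        result.removeAll(excluded);
        return result;
    }

    // Requested grouping labels that the child cannot provide.
    static Set<String> missing(Set<String> requested, Set<String> available) {
        Set<String> result = new TreeSet<>(requested);
        result.removeAll(available);
        return result;
    }

    public static void main(String[] args) {
        // avg by (c) (sum without (b) (max by (a,b) (...)))
        Set<String> inner = afterBy(Set.of("a", "b"));
        Set<String> middle = afterWithout(inner, Set.of("b"));
        Set<String> unknown = missing(Set.of("c"), middle);
        if (unknown.isEmpty() == false) {
            System.out.println("warning: grouping labels not produced by the inner aggregation: " + unknown);
        }
    }
}
```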

resolved.add(byName);
}
}
return normalizeLabels(resolved);
Member

Nit: Some callers pass in the output of prior `normalizeLabels(...)` calls. This is low overhead but slightly redundant.

/**
* Strips non-grouping attributes (metadata, unresolved, metrics, NULLs) and deduplicates by field name.
*/
private static List<Attribute> normalizeLabels(List<Attribute> attributes) {
Member

Calling this for agg.output() seems redundant as nulls should be filtered out already. For agg.groupings() it seems we don't need the full normalization but just the null filtering?

The only place where all of the normalization is needed is normalizeLabels(resolved) inside resolveLabels(). Maybe also rename to something less vague than "normalize". Maybe something like retainLabelAttributes.
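One way the split suggested above could look, as a hedged sketch: a full `retainLabelAttributes` that filters and deduplicates by name, and a lighter `dropNullTyped` for callers that only need the null filtering. The `Attr` record, the `Kind` enum, and `dropNullTyped` are hypothetical simplifications invented for this example. The sketch also assumes that "NULLs" refers to attributes with a null data type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class RetainLabelAttributesSketch {
    // Hypothetical attribute model; the real Attribute class is richer.
    enum Kind { DIMENSION, METRIC, METADATA, UNRESOLVED, NULL_TYPE }
    record Attr(String name, Kind kind) {}

    // Full normalization: keep only dimension attributes, first occurrence per name wins.
    static List<Attr> retainLabelAttributes(List<Attr> attributes) {
        Map<String, Attr> byName = attributes.stream()
            .filter(attr -> attr.kind() == Kind.DIMENSION)
            .collect(Collectors.toMap(Attr::name, Function.identity(), (first, duplicate) -> first, LinkedHashMap::new));
        return List.copyOf(byName.values());
    }

    // Lighter variant for callers that only need null-typed attributes removed.
    static List<Attr> dropNullTyped(List<Attr> attributes) {
        return attributes.stream().filter(attr -> attr.kind() != Kind.NULL_TYPE).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Attr> attrs = List.of(
            new Attr("host", Kind.DIMENSION),
            new Attr("_index", Kind.METADATA),
            new Attr("host", Kind.DIMENSION),
            new Attr("cpu", Kind.METRIC),
            new Attr("gone", Kind.NULL_TYPE));
        System.out.println(retainLabelAttributes(attrs));
        System.out.println(dropNullTyped(attrs));
    }
}
```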

@sidosera
Contributor Author

> The logic is sound but the code reads as a sequence of "what" rather than "why" and I found it hard to follow, even though I have an understanding what this is trying to do at a high level. The biggest single improvement would be a concise top-level explanation of the label-propagation model, followed by splitting createOuterAggregate into its two distinct cases. Maybe there are more things we can do to make it more readable.

Thanks for the feedback @felixbarny. I agree, we may want to add diagrams and examples to illustrate the label-propagation model. I took a pass and added a high-level summary above the createOuterAggregate function. I also addressed your broader concern about readability in follow-up PR #144261. Let me know if this makes sense now.

@sidosera sidosera requested a review from felixbarny March 26, 2026 10:49

Stage the capability flag, lowering prep, and gated coverage needed for incremental WITHOUT support while keeping translator behavior unchanged.

Wire WITHOUT grouping through selector translation, aggregate planning, and root projection so the staged plumbing becomes functional.

Enable the capability and focused verifier/optimizer coverage for the supported WITHOUT cases.

Upstream moved STEP_COLUMN_NAME to PromqlCommand; use the accessor.

@sidosera sidosera requested review from a team as code owners March 26, 2026 15:13
@sidosera sidosera requested a review from a team as a code owner March 26, 2026 15:13
@sidosera sidosera requested a review from jeramysoucy March 26, 2026 15:13
@sidosera sidosera force-pushed the feature/promql-without-grouping-v2 branch from 2350951 to 4fd76d4 Compare March 26, 2026 15:17
@jeramysoucy

@sidosera It doesn't look like Kibana Security owns any of the changes here, though we were triggered as a code owner. I will remove the request for our team, but I am still happy to review if there is something you'd like us to look at.

@jeramysoucy jeramysoucy removed request for a team and jeramysoucy March 26, 2026 15:29
@sidosera sidosera removed request for a team March 26, 2026 17:43
@sidosera sidosera enabled auto-merge (squash) March 26, 2026 17:43
@shainaraskas shainaraskas removed the request for review from a team March 26, 2026 19:52
@sidosera
Contributor Author

Closing this, as a later refactoring PR that contains this change has landed.

@sidosera sidosera closed this Mar 27, 2026
auto-merge was automatically disabled March 27, 2026 12:51
